Storing and loading data in an array-based computing environment

ABSTRACT

A schema that enables a user to store data generated in an array-based computing environment in a scientific data file is disclosed. The schema may provide a mapping between the data types of the array-based computing environment and the data types of the scientific data file format. The schema may also apply when data is loaded from the scientific data file into the array-based computing environment. When the data is stored in the scientific data file, the file contains descriptions of the data as the variables of the array-based computing environment so that the data in the file can be loaded into the array-based computing environment without additional user input. The loaded variables (name and value) are identical to their state before the data is stored in the file.

FIELD OF THE INVENTION

The present invention relates generally to an array-based computing environment and more particularly to a method, system, and mediums for storing and loading data in the array-based computing environment.

BACKGROUND OF THE INVENTION

MATLAB® from The MathWorks, Inc. of Natick, Mass. provides a technical computing environment. MATLAB® provides an array-based computing environment in which a workspace consisting of a set of named arrays (variables) is built up during a MATLAB® session and stored in memory. In the conventional MATLAB® environment, the data generated in the workspace is stored in a MAT file format. A MAT file stores data in binary form. When a user creates a MAT file, the arrays in the workspace are saved in the MAT file as a continuous byte stream.

Since a MAT file stores data in binary form, it may require a large amount of storage to save a large amount of data, such as scientific data. Scientific data file formats have been developed to save large amounts of scientific data. An example of a scientific data file format is the Hierarchical Data Format, Version 5 (HDF5) file format. The HDF5 file format is a general purpose format for scientific data, supported by public domain code and documentation. It would be desirable to be able to store the data generated in the MATLAB® environment using a scientific data file format, such as HDF5.

SUMMARY OF THE INVENTION

The illustrative embodiment of the present invention provides a schema that enables a user to store in a scientific data file format the data generated and used in an array-based computing environment. The schema may provide a mapping between the data types of the array-based computing environment and the data types of the scientific data file format. The schema may apply when the data stored in the scientific data file is loaded into the array-based computing environment. When the data is stored in a container or repository, such as a file, a database, memory and storage, with a scientific data file format, the container or repository contains descriptions of the data as the variables of the array-based computing environment so that the data in the container or repository can be loaded into the array-based computing environment without additional user input and the loaded variables (name and value) are identical to their state when they are stored in the container or repository.

In one aspect of the present invention, a method is provided for storing or loading data in an array-based computing environment. Data is generated in a workspace of the array-based computing environment. The data is stored in a Hierarchical Data Format, Version 5 (HDF5) file, wherein types of the data in the array-based computing environment are automatically mapped to corresponding data types of the HDF5 file. The data of the HDF5 file may be loaded into the array-based computing environment.

In another aspect of the present invention, a system is provided for storing or loading data in an array-based computing environment. The system includes a workspace for containing data generated in the array-based computing environment. The system also includes a storage unit for storing the data in a Hierarchical Data Format, Version 5 (HDF5) file. The system further includes a schema for mapping the types of the data in the workspace to corresponding data types of the HDF5 file.

In another aspect of the present invention, a computer program product holding instructions executable in a computer is provided for storing or loading data in an array-based computing environment. Data is generated in a workspace of the array-based computing environment. The data is stored in a Hierarchical Data Format, Version 5 (HDF5) file, wherein types of the data in the array-based computing environment are automatically mapped to corresponding data types of the HDF5 file. The data of the HDF5 file may be loaded into the array-based computing environment.

BRIEF DESCRIPTION OF THE DRAWINGS

The aforementioned features and advantages, and other features and aspects of the present invention, will become better understood with regard to the following description and accompanying drawings, wherein:

FIG. 1 depicts an exemplary system suitable for practicing the illustrative embodiment of the present invention;

FIG. 2 shows an exemplary computing device for implementing the illustrative embodiment of the present invention;

FIG. 3 is an exemplary network environment that enables an online implementation of the present invention;

FIG. 4A shows a detailed configuration of the computing environment depicted in FIG. 1;

FIG. 4B shows an exemplary sparse array;

FIG. 5 is a flow chart showing an exemplary operation for storing data in an HDF5 file format; and

FIG. 6 is a flow chart showing an exemplary operation for loading data from the HDF5 file format into the computing environment.

DETAILED DESCRIPTION

The illustrative embodiment of the present invention provides a mapping between data types of an array-based computing language and data types of a scientific data format, such as a Hierarchical Data Format, Version 5 (HDF5). The illustrative embodiment provides a schema for storing data from the workspace of the array-based computing language into a file in HDF5. The schema of the illustrative embodiment may also apply to the data stored in the HDF5 file when the data is loaded into the workspace of the array-based computing language. The embodiment of the present invention will be described below only for illustrative purposes relative to MATLAB®. Although the illustrative embodiment is described relative to MATLAB®, those of skill in the art will appreciate that the present invention may be practiced in other computing or programming environments. Those of skill in the art will also appreciate that the schema may be modified to use other scientific file formats, such as CDF (Common Data Format), FITS (Flexible Image Transport System), GRIB (GRid In Binary), NetCDF (Network Common Data Form), etc. Those of skill in the art will further appreciate that although the illustrative embodiment is described relative to a file as a container of the data in a scientific data format, the data can be stored in different types of containers or repositories, such as a database, memory and storage, in other embodiments.

FIG. 1 is an exemplary system 2 suitable for practicing the illustrative embodiment of the present invention. The system 2 may include an array-based computing environment 4 and a scientific data format file 8 for storing data in the workspace 6. An exemplary array-based computing environment 4 can be provided by MATLAB® from The MathWorks, Inc. of Natick, Mass. MATLAB® is an intuitive language and provides a technical computing environment. The MATLAB® environment integrates mathematical computing, visualization, and a powerful technical language. MATLAB® provides core mathematics and advanced graphical tools for data analysis, visualization, and algorithm and application development. MATLAB® provides a range of computing tasks in engineering and science, from data acquisition and analysis to application development. Built-in interfaces of MATLAB® enable users to access and import data from instruments, files, and external databases and programs. In addition, MATLAB® enables the users to integrate external routines written in C, C++, Fortran, and Java with the MATLAB® applications.

MATLAB® supports dynamically typed programming. In a dynamically typed programming environment, types are assigned to each data value in memory at runtime, rather than assigning a type to a static, syntactic entity in the program source code. The dynamically typed programming environment catches errors related to the misuse of values at the time the erroneous statement or expression is executed. In contrast, types are assigned to sets of values based on the program's source code in a statically typed programming environment. Static type disciplines operate on program source code rather than on the program execution. Therefore, in the statically typed programming environment, certain kinds of errors are detected without executing the program.

The computing environment 4 may include a workspace 6 for containing data generated and used in the computing environment 4. In the illustrative embodiment, the workspace 6 refers to memory space for the names and values of any variables used in the current MATLAB® session. The workspace 6 can be named and hierarchical, so that one workspace may include a variable that is a workspace itself. A variable is a symbol used to contain a value. A variable may be a scalar variable or array variable. A scalar variable is a variable that contains a single number. MATLAB® enables a user to handle a collection of numbers (“array”) as a single variable. A user may use variables to write expressions. The following MATLAB® session is an example to compute b=sin(a) for a=0, 0.1, 0.2, . . . , 10.

>>a=[0:0.1:10];

>>b=sin(a);

In the example, the second line computes sin(a) for each of the 101 values in the array a to produce an array b that has 101 values. The arrays a and b are contained in the workspace 6. The data provided in the workspace may be time series data encapsulated in an object. Time series data is a sequence of data measured at successive times spaced apart at time intervals. The time series data may be encapsulated in a time series object or time series collection object. The time series object encapsulates the time, data and metadata within a single object. The time series collection object stores one or more time series objects with different sequences of time series data. The time series object and time series collection object are described in more detail in co-pending U.S. patent application Ser. No. 11/475,320 (Title: ANALYSIS OF A SEQUENCE OF DATA IN OBJECT-ORIENTED ENVIRONMENT), the content of which is incorporated by reference.

The system 2 may also include a scientific data format file 8 for storing data in the workspace 6. The data stored in the file 8 may be loaded into the workspace. In the illustrative embodiment, the file 8 is provided externally to the computing environment 4. Those of ordinary skill in the art will appreciate that the system 2 depicted in FIG. 1 is illustrative and the file 8 may be provided internally to the computing environment 4 in other embodiments.

Data may be stored in the file 8 using HDF5. The HDF5 format is a general purpose format for scientific data and supported by public domain code and documentation. HDF5 is designed to store data of a large size, for example in a file. HDF5 files can contain data and metadata. HDF5 files organize the data and metadata, called attributes, in a hierarchical structure, similar to the hierarchical structure of a file system. In an HDF5 file, the directories in the hierarchy are called groups. A group can contain other groups, datasets, attributes, links, and data types.

A dataset is a collection of data, such as a multidimensional numeric array or string. A dataset includes a header and a data array. The header contains information on the array portion of the dataset. Header information includes the name of the object, data space, data type, information about how the data is stored on disk, and other information.

Data types are a description of the data in the dataset or attribute. Data types give information on how to interpret the data in the dataset. In HDF5, there are two categories of data types: atomic data types and compound data types. Each atomic data type belongs to a particular class and has several properties: size, order, precision, and offset. Atomic classes include integer, float, date and time, string, bit field, and opaque. Properties of integer types include size, order (endian-ness), and signed-ness (signed/unsigned). Properties of float types include the size and location of the exponent and mantissa, and the location of the sign bit. A compound data type is one in which a collection of several data types is represented as a single unit, similar to a struct in C. The parts of a compound data type are called members. The members of a compound data type may be of any data type, including another compound data type.

A data space describes the dimensionality of the dataset. The dimensions of a dataset can be fixed (unchanging), or they may be unlimited, which means that they are extendible (i.e. they can grow larger). Properties of a data space consist of the rank (number of dimensions) of the data array, the actual sizes of the dimensions of the array, and the maximum sizes of the dimensions of the array. For a fixed-dimension dataset, the actual size is the same as the maximum size of a dimension.

An attribute is any data that is associated with another entity. Attributes are small named datasets that are attached to primary datasets, groups, or named data types. Attributes can be used to describe the nature and/or the intended usage of a dataset or group. An attribute has two parts: (1) a name and (2) a value. The value part contains one or more data entries of the same data type.

A link is similar to a UNIX file system symbolic link. Links are a way to reference data without having to make a copy of the data.

One of ordinary skill in the art will also appreciate that the computing environment 4 and the storage may be provided on the same device, which will be described below in more detail with reference to FIG. 2, or alternatively, the computing environment 4 and the storage may be provided in a network environment, which will be described below in more detail with reference to FIG. 3.

FIG. 2 is an exemplary computing device 10 suitable for practicing the illustrative embodiment of the present invention. One of ordinary skill in the art will appreciate that the computing device 10 is intended to be illustrative and not limiting of the present invention. The computing device 10 may take many forms, including but not limited to a workstation, server, network computer, quantum computer, optical computer, bio computer, Internet appliance, mobile device, a pager, a tablet computer, and the like.

The computing device 10 may be electronic and include an execution unit 11, memory 12, storage 13, an input control 14, a modem 15, a network interface 16, a display 17, etc. The execution unit 11 controls each component of the computing device 10 to provide the computing environment 4 and the file 8. The memory 12 temporarily stores instructions and data and provides them to the execution unit 11 so that the execution unit 11 operates the computing device 10.

Optionally, the computing device 10 may include multiple Central Processing Units (CPUs) 11 a and 11 d for executing software loaded in the memory 12, and other programs for controlling system hardware. Each of the CPUs 11 a and 11 d can include a single core or multiple cores 11 b and 11 c. The code loaded in the memory 12 may run in a virtualized environment, such as in a Virtual Machine (VM). Multiple VMs may be resident on a single processor. Also, part of the application could be run in hardware 11 e, for example, by configuring a field programmable gate array (FPGA) or creating an application specific instruction processor (ASIP) or an application specific integrated circuit (ASIC).

The storage 13 may contain software tools for applications. The storage 13 may include, in particular, code 20 for the operating system (OS) of the device 10, code 21 for applications running on the operating system including the computing environment 4, and data 22 for the file 8. Those of ordinary skill in the art will appreciate that the application can be stored in the memory 12 as well, much like the data, and even the OS, or they can be stored on the network described below with reference to FIG. 3.

The input control 14 may interface with a keyboard 18, a mouse 19, and other input devices. The computing device 10 may receive through the input control 14 input data, such as the input commands in the MATLAB® session. The computing device 10 may display on the display 17 the data generated or used in the MATLAB® session. The computing device 10 may also display user interfaces that enable a user to save data in the HDF5 file or to load data from the HDF5 file.

FIG. 3 is an exemplary network environment 24 suitable for the distributed implementation of the illustrative embodiment. The network environment 24 may include one or more servers 26 and 29 coupled to clients 27 and 28 via a communication network 25. The network interface 16 and the modem 15 of the computing device 10 enable the servers 26 and 29 to communicate with the clients 27 and 28 through the communication network 25. The communication network 25 may include the Internet, intranet, LAN (Local Area Network), WAN (Wide Area Network), MAN (Metropolitan Area Network), wireless network (e.g., using IEEE 802.11 and Bluetooth), etc. The communication facilities can support the distributed implementations of the present invention.

In the network environment 24, the client 28 may run a MATLAB® session and generate data in the workspace 6. The client 28 may send the data to the server 26 for storage. The server 26 may include a storage unit for storing the data in an HDF5 file. In response to the client's request for loading data from the server 26, the server 26 may send the data in the HDF5 file to the client 28 or a different client 27. The client 27 or 28 may load the data from the server into the workspace with the same state as the data before the data is saved to the server. In another embodiment, the workspace and storage may reside in the servers 26 and 29, respectively, and be coupled to each other through the communication network 25. The servers 26 and 29 may communicate with each other through the communication network 25.

In the networked environment, multiple computing environments may cooperate to share data between them, and to coordinate on saving that data. For example, the clients 27 and 28 may run separate computing environments and share data using, for example, distributed arrays. Distributed arrays may be distributed across the clients 27 and 28 and each computing environment of the clients 27 and 28 handles calculation on a portion of the distributed arrays. The data in the distributed arrays may be stored in an HDF5 file. Distributed arrays are described in detail in U.S. patent application Ser. No. 10/940,152 filed on Sep. 13, 2004, entitled “METHODS AND SYSTEM FOR EXECUTING A PROGRAM IN MULTIPLE EXECUTION ENVIRONMENTS,” the content of which is incorporated by reference.

FIG. 4A shows a detailed configuration of the computing environment 4 depicted in FIG. 1. The computing environment includes schema 31 for mapping between the MATLAB® data types and the HDF5 data types. The schema 31 may enable data in the workspace 6 to be stored in the HDF5 file 8 and the data in the HDF5 file 8 to be loaded in the MATLAB® workspace 6. When the data is stored in the HDF5 file 8, the HDF5 file 8 also contains descriptions of the data as MATLAB® variables so that the data in the file 8 can be loaded into MATLAB® workspace 6 without additional user input. The schema 31 specifies how to store data in an HDF5 file in such a manner that when the HDF5 file 8 is loaded into the computing environment 4, the loaded variables (name and value) are identical to their state on the saving machine. The following are exemplary elements of the schema 31 for mapping between the MATLAB® data types and the HDF5 data types.

Names

In the illustrative embodiment, the schema 31 maps MATLAB® names (names of variables, and fields of structs) to the identical HDF5 names because the set of legal HDF5 names is a superset of the set of legal MATLAB® names. The schema 31 can use HDF5 names that are not legal MATLAB® names for specialized data, without colliding with a MATLAB® variable.

Variables

The illustrative embodiment maps each saved MATLAB® variable to an immediate child group or dataset of the HDF5 root group in the hierarchy of the HDF5 file format. Each immediate child group of the HDF5 root group, where it has a name that is identical to a legal MATLAB® variable name, contains a saved MATLAB® variable.
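The naming rule can be illustrated with a minimal Python sketch. It assumes that a legal MATLAB® name is a letter followed by letters, digits, or underscores, and it ignores keywords and the maximum name length. The function name saved_variable_names and the list of children are hypothetical; the names #refs# and #subsystem# are the reserved top level groups described later in this schema.

```python
import re

# Assumption: a legal MATLAB name is a letter followed by letters, digits,
# or underscores (keywords and the maximum name length are ignored here).
LEGAL_MATLAB_NAME = re.compile(r"[A-Za-z][A-Za-z0-9_]*\Z")

def saved_variable_names(root_children):
    """Return the root-level child names that can denote saved variables."""
    return [name for name in root_children if LEGAL_MATLAB_NAME.match(name)]

# Hypothetical listing of the children of the HDF5 root group.
children = ["a", "b", "#refs#", "#subsystem#"]
variables = saved_variable_names(children)
```

Because names such as #refs# can never be MATLAB® variable names, they are filtered out and cannot collide with saved variables.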

MATLAB® Class

Every HDF5 dataset or group that corresponds to a well-formed MATLAB® value has an HDF5 attribute describing its MATLAB® class. In the illustrative embodiment, the attribute name is MATLAB_class, the datatype of the attribute is a fixed length string, the dataspace of the attribute is scalar, and the data is the MATLAB® class name. The MATLAB® class name may be one of the reserved names for built-in MATLAB® classes (e.g. double, single, char, etc.) or it may name a user-defined class. The term “well-formed MATLAB® value” is used to refer not only to the value of variables, but to any value available to the MATLAB® user, such as the contents of a cell of a cellarray, or a field of a struct.

Global Attribute

The HDF5 dataset or group corresponding to a global MATLAB® variable is marked as global by the presence of an HDF5 attribute named MATLAB_global. The datatype and dataspace of the attribute are ignored. The illustrative embodiment uses a scalar dataspace of a single 1-byte integer, but the presence of the attribute alone indicates that the variable is global.

Dimensions and Storage Order

For non-empty MATLAB® values, the MATLAB® dimensions may correspond to the dataspace dimensions in reverse order. The storage order of the data from the MATLAB® variable in memory may correspond directly to the storage order of the data in the HDF5 file. MATLAB® uses FORTRAN-style indexing where the first index varies most rapidly. HDF5 uses C-style indexing where the last index varies most rapidly. In order to preserve the linear order of elements (most important for performance) the dimensions are reversed. For example, a MATLAB® array A(2,3) having the order of elements a(0,0), a(1,0), a(0,1), a(1,1), a(0,2) and a(1,2) is stored as an HDF5 array A(3,2) having the order of elements a(0,0), a(0,1), a(1,0), a(1,1), a(2,0) and a(2,1).
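The effect of reversing the dimensions can be checked with a short Python sketch. It enumerates the index tuples of a 2-by-3 array in column-major order and of a 3-by-2 array in row-major order, using zero-based indices as in the example above. The helper names are hypothetical.

```python
from itertools import product

def column_major_order(dims):
    """Index tuples in FORTRAN order: the first index varies most rapidly."""
    # product() varies its last argument fastest, so iterate over the
    # reversed dimensions and flip each resulting tuple.
    return [idx[::-1] for idx in product(*(range(d) for d in reversed(dims)))]

def row_major_order(dims):
    """Index tuples in C order: the last index varies most rapidly."""
    return list(product(*(range(d) for d in dims)))

matlab_dims = (2, 3)
hdf5_dims = matlab_dims[::-1]  # (3, 2)

matlab_walk = column_major_order(matlab_dims)
hdf5_walk = row_major_order(hdf5_dims)

# Each HDF5 index is the MATLAB index with its subscripts reversed, so the
# two walks visit the same elements in the same linear order.
same_linear_order = [idx[::-1] for idx in matlab_walk] == hdf5_walk
```

Since the linear order is unchanged, the bytes of the array can be written as they sit in memory, with no transposition.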

Endian-Agnostic

The endian-ness of data is a property of the HDF5 datatype associated with each dataset. Data elements may be written in either little-endian or big-endian representation. Little-endian and big-endian refer to which bytes are most significant in multi-byte data types and describe the order in which a sequence of bytes is stored in a computer memory. In a big-endian system, the most significant value in the sequence is stored at the lowest storage address (i.e., first). In a little-endian system, the least significant value in the sequence is stored first. The platforms of the MATLAB® environment 4 may write data in their native format, and convert, if necessary, on reading. Many computers, particularly IBM mainframes, use a big-endian architecture. Other computers, including PCs, use the little-endian system. The bit ordering within each byte can also be big- or little-endian, and some architectures actually use big-endian ordering for bits and little-endian ordering for bytes, or vice versa. The computing environment 4 may be adapted for the endian-ness of the platform on which the computing environment 4 is provided. The data in a MAT file has an endian-ness. The computing environment 4 allows a user not to care about the match or mismatch between the platform's endian-ness and the file data endian-ness. When storing data in the HDF5 datatype, the computing environment 4 may describe the endian-ness of the data in each dataset in the HDF5 file 8.
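The write-native, convert-on-read policy can be sketched in Python with the standard struct and sys modules. The sketch is illustrative only: the function read_uint32 and the idea of passing the recorded byte order as a string are assumptions, standing in for the endian-ness that the HDF5 datatype records for each dataset.

```python
import struct
import sys

value = 0x01020304

# The same 32-bit value in both byte orders.
big = struct.pack(">I", value)     # most significant byte first
little = struct.pack("<I", value)  # least significant byte first

def read_uint32(raw, file_order):
    """Decode 4 bytes using the byte order recorded with the data."""
    return int.from_bytes(raw, byteorder=file_order)

# A writer stores bytes in its native order and records that order;
# a reader on any platform converts using the recorded order.
native_order = sys.byteorder  # 'little' or 'big'
native_bytes = value.to_bytes(4, byteorder=native_order)
restored = read_uint32(native_bytes, native_order)
```

The reader never needs to know which platform wrote the data, only the byte order stored alongside it.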

Datatype for Dimensions and Indices

MATLAB® indices and dimensions are written as unsigned integers of 32 bits, or of a greater power-of-two number of bits. Analogous to endian-ness of data, the platforms of the computing environment 4 may write indices in their native length. At the time of reading the data, the platforms may check the datatype, and prepare for the possibility of overflow when converting to their native datatype. Indices are explicitly written as data in the case of sparse matrices. Dimensions are explicitly written as data in the case of empty matrices and sparse matrices. The empty matrices and sparse matrices will be described below in more detail.
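The overflow check on reading can be sketched as follows. This is a minimal Python illustration, assuming a reader whose native index type is an unsigned integer of native_bits bits; the function name convert_index and the choice to raise an error are hypothetical.

```python
def convert_index(value, native_bits=32):
    """Convert an index read from a file to the reader's native unsigned width.

    Raises OverflowError when the stored index does not fit, for example a
    64-bit index read on a platform with 32-bit indices.
    """
    limit = (1 << native_bits) - 1
    if not 0 <= value <= limit:
        raise OverflowError(
            "index %d does not fit in %d unsigned bits" % (value, native_bits)
        )
    return value

fits = convert_index(2**31)                      # fits in 32 unsigned bits
wide = convert_index(2**40, native_bits=64)      # fits in 64 unsigned bits
```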

Full Doubles and Singles

Non-empty double and single-precision matrices are written as a dataset of IEEE standard floating point double and single datatypes in the HDF5 file.

Complex

If the MATLAB® double or single is complex, the HDF5 datatype will be a compound datatype with fields real and imag. The types of both fields may be the corresponding (double or single) IEEE standard floating point number. By placing real and imaginary parts of a single element beside each other, this amounts to interlaced storage.
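The interlaced layout can be illustrated with Python's struct module. The sketch packs each complex double as a real part immediately followed by its imaginary part, which is the byte layout a two-field record of doubles would have. The little-endian byte order and the function names are assumptions made for the example.

```python
import struct

values = [1.0 + 2.0j, -3.5 + 0.25j]

def pack_complex_double(zs):
    """Pack complex numbers as interlaced (real, imag) little-endian doubles."""
    flat = []
    for z in zs:
        flat.extend((z.real, z.imag))
    return struct.pack("<%dd" % len(flat), *flat)

def unpack_complex_double(raw):
    """Rebuild complex numbers from interlaced (real, imag) doubles."""
    flat = struct.unpack("<%dd" % (len(raw) // 8), raw)
    return [complex(re, im) for re, im in zip(flat[0::2], flat[1::2])]

raw = pack_complex_double(values)
restored = unpack_complex_double(raw)
```

Each element occupies 16 contiguous bytes, so a reader can fetch a whole complex value with a single access.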

Integer Classes

MATLAB® integer classes are written as datasets of the corresponding HDF5 integer datatype.

Disambiguating Integer Storage

In cases where non-integer MATLAB® types are stored as integer HDF5 datatypes, an attribute named MATLAB_int_decode may be added to the dataset. In the illustrative embodiment, the dataspace is scalar. The datatype of the attribute is any integer suitable to represent the C enum:

typedef enum {NO_INT_HINT=0, LOGICAL_HINT, UTF16_HINT} integerDecodingHint;

The attribute may be optional if its value is NO_INT_HINT.

Logicals

MATLAB® logicals may be stored as a numeric array of UINT8s. Because this is ambiguous without additional information, the presence of the MATLAB_int_decode attribute, as described above, with value LOGICAL_HINT may be necessary. Without the MATLAB_int_decode attribute, the dataset may be read as an ordinary UINT8 array.

Characters

MATLAB® chars may be stored as a numeric array of UINT16s, representing the MATLAB® native UTF-16 character encoding. Because this is ambiguous without additional information, the presence of the MATLAB_int_decode attribute, as described above, with value UTF16_HINT may be necessary. The characters may be stored in Unicode, without conversion to local code page. In the illustrative embodiment, HDF5 only provides an ASCII character datatype.
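How a reader might use the decoding hint can be sketched in Python. The enumeration mirrors the C enum given above; the functions decode_integers and encode_chars, and the use of plain Python lists in place of HDF5 datasets, are assumptions made for illustration.

```python
from enum import IntEnum

class IntegerDecodingHint(IntEnum):
    """Python mirror of the integerDecodingHint C enum."""
    NO_INT_HINT = 0
    LOGICAL_HINT = 1
    UTF16_HINT = 2

def encode_chars(text):
    """Store a string as a list of UTF-16 code units (UINT16 values)."""
    raw = text.encode("utf-16-le")
    return [int.from_bytes(raw[i:i + 2], "little") for i in range(0, len(raw), 2)]

def decode_integers(stored, hint=IntegerDecodingHint.NO_INT_HINT):
    """Interpret stored integers according to the MATLAB_int_decode hint."""
    if hint == IntegerDecodingHint.LOGICAL_HINT:
        return [bool(v) for v in stored]
    if hint == IntegerDecodingHint.UTF16_HINT:
        raw = b"".join(v.to_bytes(2, "little") for v in stored)
        return raw.decode("utf-16-le")
    # No hint: the data is an ordinary integer array.
    return list(stored)

flags = decode_integers([1, 0, 1], IntegerDecodingHint.LOGICAL_HINT)
text = decode_integers(encode_chars("h\u00e9llo"), IntegerDecodingHint.UTF16_HINT)
plain = decode_integers([1, 0, 1])
```

The same stored integers yield a logical array, a character string, or a plain integer array depending only on the hint.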

Sparse Array

Sparse arrays (of all sparse types) are stored as a group containing (in the general case) three (3) datasets. The three datasets correspond to MATLAB®'s internal representation of a sparse array. Their names are data, ir and jc. All three datasets may have a one-dimensional dataspace. FIG. 4B shows an exemplary sparse matrix 41. The datatype of the data dataset 43 may be the same as the corresponding non-sparse representation. The ir dataset 45 and the jc dataset 47 contain row and column indices, respectively. A sparse matrix with all zero elements may not have the data and jc datasets, only the ir dataset. In order to disambiguate a sparse matrix from the (admittedly pathological) case of a scalar struct with three fields coincidentally named “data”, “ir”, and “jc”, all groups representing sparse matrices may have an attribute named MATLAB_sparse. Because the number of rows of a sparse matrix is not represented in the data, ir, or jc components, it may be stored as the value of the MATLAB_sparse attribute. Because this is a dimension, its datatype is also that of a MATLAB® dimension.
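The three-dataset representation can be sketched in Python. The sketch assumes a compressed sparse column layout, in which ir holds the zero-based row index of each stored value and jc holds cumulative offsets marking where each column starts in data and ir; the text above does not spell out the exact contents of jc, so this is an assumption. A dictionary stands in for the HDF5 group, and the function names are hypothetical.

```python
def to_sparse_components(dense):
    """Build data / ir / jc for a matrix given as a list of rows."""
    n_rows = len(dense)
    n_cols = len(dense[0]) if n_rows else 0
    data, ir, jc = [], [], [0]
    for col in range(n_cols):
        for row in range(n_rows):
            if dense[row][col] != 0:
                data.append(dense[row][col])
                ir.append(row)
        jc.append(len(data))  # offset at which the next column starts
    # The row count is not recoverable from data, ir, or jc, so it is kept
    # as the value of the MATLAB_sparse attribute.
    return {"data": data, "ir": ir, "jc": jc, "MATLAB_sparse": n_rows}

def from_sparse_components(parts):
    """Rebuild the dense matrix from the stored components."""
    n_rows = parts["MATLAB_sparse"]
    n_cols = len(parts["jc"]) - 1
    dense = [[0] * n_cols for _ in range(n_rows)]
    for col in range(n_cols):
        for k in range(parts["jc"][col], parts["jc"][col + 1]):
            dense[parts["ir"][k]][col] = parts["data"][k]
    return dense

matrix = [[1, 0, 0],
          [0, 0, 2]]
parts = to_sparse_components(matrix)
```

Under this layout the column count follows from the length of jc, which is why only the row count needs to be stored separately.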

In an embodiment, the sparse matrix may be represented in canonical form, such as controller canonical form and observer canonical form, to store its data in HDF5. Since the canonical form diagonalizes the sparse matrix by a similarity transformation, the sparse matrix can be stored in HDF5 using a single dataset.

Empties

Because HDF5 dataspaces must have positive dimensions, but MATLAB® allows matrices with any number of zero dimensions, the representation of MATLAB® empty matrices may be different from that of non-empty matrices. All empties may be represented as a dataset with attribute MATLAB_empty. The dataset may have a one-dimensional dataspace, and the data may contain the dimensions of the matrix, in MATLAB® order.
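The rule for empties can be sketched in Python, with a dictionary standing in for an HDF5 dataset. The function names, the dictionary keys, and the attribute value 1 are hypothetical; only the attribute name MATLAB_empty and the stored dimension list come from the schema.

```python
def store_array(dims):
    """Describe how an array with MATLAB dimensions `dims` would be stored."""
    if any(d == 0 for d in dims):
        # Empty: a one-dimensional dataset whose data is the dimension
        # list itself, in MATLAB order, marked by MATLAB_empty.
        return {
            "attrs": {"MATLAB_empty": 1},
            "dataspace": (len(dims),),
            "data": list(dims),
        }
    # Non-empty: the dataspace carries the dimensions in reverse order.
    return {"attrs": {}, "dataspace": tuple(reversed(dims)), "data": None}

def load_dims(node):
    """Recover the MATLAB dimensions from a stored node."""
    if "MATLAB_empty" in node["attrs"]:
        return tuple(node["data"])
    return tuple(reversed(node["dataspace"]))

empty_node = store_array((0, 3))
full_node = store_array((2, 3))
```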

Cellarrays

In the illustrative embodiment, a cell array is used to refer to an array whose elements can hold any arbitrary data type, including structs or other cell arrays. A user can store arrays of different types and/or sizes within the cells of a cell array. For example, a user can store a 1-by-50 char array, a 7-by-13 double array, and a 1-by-1 uint32 in cells of the same cell array. A cell array is stored as a dataset of HDF5 references. The references may point to child nodes of the top level group named #refs#. Each element of the cell array is stored in one of those child nodes of #refs#. This permits, though does not require, sharing of data by having multiple references point at the same place. #refs# does not correspond to any named MATLAB® variable, nor do its child nodes. The child nodes do correspond to values, typically elements of a cellarray or struct, but their HDF5 names are for reference within the file.
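The reference scheme can be sketched in Python, with nested dictionaries standing in for the HDF5 file and its groups. String keys play the role of HDF5 references. The key format ref0, ref1, and so on, and the function names are hypothetical, since the names of the child nodes matter only within the file.

```python
def store_cell(file, name, elements):
    """Store a cell array as a list of references into the #refs# group."""
    refs_group = file.setdefault("#refs#", {})
    refs = []
    for element in elements:
        key = "ref%d" % len(refs_group)  # hypothetical internal name
        refs_group[key] = element
        refs.append(key)
    file[name] = {"MATLAB_class": "cell", "refs": refs}

def load_cell(file, name):
    """Follow each reference to rebuild the cell array's elements."""
    return [file["#refs#"][key] for key in file[name]["refs"]]

hdf5_file = {}
store_cell(hdf5_file, "c", ["text", [1.0, 2.0], 7])
loaded = load_cell(hdf5_file, "c")
```

Because elements are reached only through references, two cells could point at the same child node of #refs# without duplicating its data.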

Structs

A struct is an array whose elements are indexed by names (called “fields”), as well as by numerical indices. Like cell arrays, the elements of a struct can be any data type, including other structs. For example, one field may contain a text string representing a name, another may contain a scalar representing a billing amount, a third may hold a matrix of medical test results, and so on. A struct is stored as a group. Each field of the struct corresponds to, and has the same name as, a child of the group. In the case of non-scalar structs, each child of the group may be a dataset of references, in the same manner as the cellarray described above. In order to preserve order of fields, when there is more than one field, the set of field names is stored, in order, as an attribute named MATLAB_fields. The MATLAB_fields attribute is optional in the case of a single field. In the case of an empty struct, the rule for empties may take precedence, in determining storage as a dataset, but the field names attribute will still be present.

Scalar Structs

If the struct is scalar, the HDF5 representation may be optimized by skipping the dataset of references. In this case, the contents of each child of the group correspond directly to the corresponding field values.
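The scalar struct case can be sketched in Python, with a dictionary standing in for the HDF5 group. The function names and the dictionary layout are hypothetical; the attribute names MATLAB_class and MATLAB_fields come from the schema. The children are sorted on purpose, to show that field order is recovered from the attribute and not from the order in which a group happens to list its children.

```python
def store_scalar_struct(fields):
    """Store a scalar struct given as (name, value) pairs in field order."""
    group = {"attrs": {"MATLAB_class": "struct"}, "children": {}}
    names = [name for name, _ in fields]
    if len(names) > 1:
        # Field order is kept in an attribute when there is more than one field.
        group["attrs"]["MATLAB_fields"] = names
    for name in sorted(names):  # child order is deliberately not field order
        group["children"][name] = dict(fields)[name]
    return group

def load_scalar_struct(group):
    """Rebuild the (name, value) pairs in their original field order."""
    names = group["attrs"].get("MATLAB_fields", list(group["children"]))
    return [(name, group["children"][name]) for name in names]

record = [("name", "Smith"), ("billing", 127.0), ("tests", [[1, 2], [3, 4]])]
group = store_scalar_struct(record)
loaded = load_scalar_struct(group)
```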

Storage of Objects Requiring Serialization or Marshaling

In cases where MATLAB® object types are stored by converting them to non-object types, an attribute named MATLAB_object_decode may be added to the group or dataset. In the illustrative embodiment, the dataspace is scalar. The datatype of the attribute may be any integer suitable to represent the C enum:

typedef enum {NO_OBJ_HINT=0, FUNCTION_HINT, OBJECT_HINT, OPAQUE_HINT} objectDecodingHint;

MATLAB® Object Oriented Programming System (OOPS) Objects

MATLAB® OOPS objects are stored as structs. Because this is ambiguous without additional information, the presence of the MATLAB_object_decode attribute, as described above, with value OBJECT_HINT may be necessary.

Opaque Classes

MATLAB® opaque classes are stored by first converting them to a non-opaque type, plus additional information to be stored in the subsystem data, then secondly storing the non-opaque data as described above. The first stage, conversion to a non-opaque type, takes place as a MATLAB-to-MATLAB transformation. The first stage is therefore independent of the choice of file format, and could be applied to any marshalling or serialization method. Because the non-opaque result of the first stage is ambiguous without additional information, the presence of the MATLAB_object_decode attribute, as described above, with value OPAQUE_HINT may be necessary in the second stage to indicate that the conversion should be reversed at read time. The reconstruction of an opaque type requires additional information at read time. This additional information, called the subsystem data, may be stored under a top level group named #subsystem#.

Function Handles

A function handle refers to a value and data type that provides a means of calling a function indirectly. A user can pass function handles in calls to other functions (often called function functions). A user can also store function handles in data structures for later use (for example, as Handle Graphics® callbacks). MATLAB® function handles are stored by converting them to a non-opaque type, plus additional information to be stored in the subsystem data. Because the non-opaque result is ambiguous without additional information, the presence of the MATLAB_object_decode attribute, as described above, with value FUNCTION_HINT may be necessary to indicate that the conversion should be reversed at read time.

Referring back to FIG. 4, the computing environment 4 may provide an application program interface (API) 33 for creating, opening, and closing the HDF5 file 8. The API 33 may also enable a user to create and write groups, datasets, and their attributes to the HDF5 file 8. With the API 33, a user may remove the datatype, dataspace and dataset objects separately from the HDF5 file 8. Using the API 33, a user can read groups, datasets and their attributes from HDF5 files. By reading data from the HDF5 files, a user may obtain information about a dataset, such as the datatype associated with the dataset, and dataspace information. The API 33 may enable a user to access portions (or selections) of a dataset.

The computing environment 4 may also provide a user interface 35 that enables a user to store data from the MATLAB® workspace 6 into the HDF5 file 8 and to load data from the HDF5 file 8 into the MATLAB® workspace 6. For example, the user may enter the following commands to save and load data (steps 51 and 61 in FIGS. 5 and 6).

>>save-v.(version number)

>>load-v.(version number)

In response to the input from the user, the computing environment 4 may map between the data types of the computing environment and the data types of the HDF5 based on the schema 31 described above when its product version (version number) supports the HDF5 file format (steps 53 and 63 in FIGS. 5 and 6). The computing environment then stores data in the HDF5 file 8 or loads data from the HDF5 file 8 (steps 55 and 65 in FIGS. 5 and 6). When the data is stored in the HDF5 file 8, the data may be encrypted. When the product version (version number) does not support the HDF5 file format, the computing environment 4 may store data from the workspace 6 into other data file formats, such as a legacy MAT file format, that the product version supports.
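The version check described above may be sketched as follows. The threshold HDF5_MIN_VERSION and the helper names are placeholders chosen for this example only; this description does not specify which version numbers support the HDF5 file format.

```python
# Hypothetical threshold, chosen only for this sketch.
HDF5_MIN_VERSION = (7, 3)


def parse_version(text):
    """Turn a dotted version string such as "7.3" into a tuple of integers."""
    return tuple(int(part) for part in text.split("."))


def choose_format(version_text):
    """Select HDF5 when the requested version supports it, else the legacy format."""
    if parse_version(version_text) >= HDF5_MIN_VERSION:
        return "HDF5"
    return "MAT"
```

Comparing integer tuples rather than strings keeps a version such as 10.1 from sorting below 7.3.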

In the illustrative embodiment, the computing environment 4 may provide a user interface 35 that enables a user to define or modify the mapping between the data types of the computing environment 4 and the data types of the HDF5. The illustrative embodiment may provide multiple schemas 31 that can map the data types of the computing environment to the data types of different scientific data formats, such as CDF (Common Data Format), FITS (Flexible Image Transport System), GRIB (GRid In Binary), NetCDF (Network Common Data Form), etc. The computing environment 4 may provide a user interface 35 that enables a user to select one or more of the multiple schemas 31.
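Schema selection and user modification may be modeled as a lookup in a registry keyed by format name. In the sketch below, the HDF5 entries paraphrase mappings stated in this description, while the registry structure, the function select_schema, and the override value "compound dataset" are hypothetical and appear only to show a user-defined change.

```python
import copy

# Illustrative only: entries paraphrase mappings stated in this description.
HDF5_SCHEMA = {
    "struct": "group",
    "cell array": "dataset of references",
    "sparse array": "group",
    "empty array": "dataset with an attribute marking it empty",
}

# Schemas for other scientific data formats would be registered by name.
SCHEMAS = {"HDF5": HDF5_SCHEMA}


def select_schema(name, overrides=None):
    """Return a copy of the named schema with optional user changes applied."""
    if name not in SCHEMAS:
        raise KeyError(f"no schema registered for format {name!r}")
    schema = copy.deepcopy(SCHEMAS[name])
    schema.update(overrides or {})
    return schema


# A user-defined modification; the override value is hypothetical.
custom = select_schema("HDF5", {"struct": "compound dataset"})
```

Returning a copy leaves the registered schema unchanged, so one user's modification does not affect later selections.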

Those of skill in the art will appreciate that a spreadsheet can be treated as an array and the illustrative embodiment of the present invention may be practiced with spreadsheets.

Certain embodiments of the present invention are described above. It is, however, expressly noted that the present invention is not limited to these embodiments, but rather the intention is that additions and modifications to what is expressly described herein also are included within the scope of the invention. Since certain changes may be made without departing from the scope of the present invention, it is intended that all matter contained in the above description or shown in the accompanying drawings be interpreted as illustrative and not in a literal sense. Practitioners of the art will realize that the sequence of steps and architectures depicted in the figures may be altered without departing from the scope of the present invention and that the illustrations contained herein are singular examples of a multitude of possible depictions of the present invention.

1. A computer-implemented method for representing data from an array-based computing language, the method comprising the steps of: providing data in the array based computing language; mapping types of the data in the array based computing language to a scientific data format; and allowing for sharing of values of fields in the scientific data format.
 2. The method of claim 1, further comprising the step of: storing data in a scientific data format.
 3. The method of claim 2, wherein the data is stored in a repository including at least one of a file, a database, memory and storage.
 4. The method of claim 3 wherein the file has a .MAT extension.
 5. The method of claim 3, wherein the data is encrypted when stored in the repository.
 6. The method of claim 1, wherein the data comprises time series data encapsulated in an object.
 7. The method of claim 1, wherein at least a subset of the array-based computing language comprises MATLAB®-compatible commands.
 8. The method of claim 1, wherein the scientific data format file comprises a Hierarchical Data Format, Version 5 (HDF5) file.
 9. The method of claim 1, wherein the data comprises variables having a property specifying a format for storing the variables.
 10. The method of claim 1, further comprising the step of: loading data from an HDF file created from data in the array-based computing language into a workspace of a computing environment.
 11. The method of claim 10, wherein a portion of the data locked for reading or writing, wherein the locked portion is specific to the data that are stored or loaded.
 12. The method of claim 10, wherein more than one read process read different portions of the data and builds an in-memory representation of the data.
 13. The method of claim 10, wherein a state of the loaded data is substantially the same as a state of the data in the workspace before the data is stored in the scientific data format file.
 14. The method of claim 1, wherein the step of mapping comprises the step of: reversing a dimension of the data to store the data in the scientific data format file.
 15. The method of claim 1, wherein the step of mapping comprises the step of: determining whether the data is to be stored in a little or big endian representation.
 16. The method of claim 1, wherein the step of mapping comprises the step of: mapping a sparse array as a group in the scientific data format.
 17. The method of claim 16, wherein the sparse array is mapped as three datasets, each dataset having one dimensional dataspace.
 18. The method of claim 16, wherein the sparse array is represented in canonical form and stored as one dataset having one dimensional dataspace.
 19. The method of claim 1, wherein the step of mapping comprises the step of: mapping an empty array as a dataset with an attribute indicating that the array is empty.
 20. The method of claim 1, wherein the step of mapping comprises the step of: mapping a data type as a dataset of references that point to child nodes of a top level group.
 21. The method of claim 20, wherein the data type is a cell array and wherein each element of the cell array is stored in one of the child nodes.
 22. The method of claim 1, wherein the step of mapping further comprises mapping at least one user defined data type to the scientific data format.
 23. The method of claim 1, wherein the step of mapping further comprises mapping one or more data types to the scientific data format.
 24. The method of claim 23, where at least one data type is at least one of: a struct, a function handle, an opaque type, and an object oriented data type.
 25. A system for storing or loading data in an array-based computing environment, the system comprising: a workspace for containing data generated in the array-based computing environment, a repository for storing the data in a scientific data format; and a schema for mapping types of the data generated in the array-based computing environment to corresponding data types of the scientific data format, wherein values of fields shared in the scientific data format.
 26. The system of claim 25, further comprising: a plurality of schemas for mapping types of the data in the array based computing environment to a plurality of scientific data formats; and a user interface for enabling a user to select one of the plurality of schemas.
 27. The system of claim 25, further comprising: an API for enabling a user to access the data in the scientific data format.
 28. The system of claim 25, wherein the repository comprises at least one of a file, a database, memory and storage.
 29. The system of claim 25, wherein the data comprises time series data encapsulated in an object.
 30. The system of claim 25, wherein at least a subset of the array-based computing language comprises MATLAB®-compatible commands.
 31. The system of claim 25, wherein the scientific data format file comprises a Hierarchical Data Format, Version 5 (HDF5) file.
 32. The system of claim 25, wherein the data comprises variables having a property specifying a format for storing the variables.
 33. The system of claim 25, wherein the workspace comprises named and hierarchical workspaces.
 34. A medium holding computer executable instructions for representing data from an array-based computing language, comprising: providing data in the array based computing language; mapping types of the data in the array based computing language to a scientific data format; and allowing for sharing of values of fields in the scientific data format.
 35. The medium of claim 34, further comprising: storing data in a scientific data format.
 36. The medium of claim 35, wherein the data is stored in a repository including at least one of a file, a database, memory and storage.
 37. The medium of claim 36 wherein the file has a .MAT extension.
 38. The medium of claim 36, wherein the data is encrypted when stored in the repository.
 39. The medium of claim 34, wherein the data comprises time series data encapsulated in an object.
 40. The medium of claim 34, wherein at least a subset of the array-based computing language comprises MATLAB®-compatible commands.
 41. The medium of claim 34, wherein the scientific data format file comprises a Hierarchical Data Format, Version 5 (HDF5) file.
 42. The medium of claim 34, wherein the data comprises variables having a property specifying a format for storing the variables.
 43. The medium of claim 34, further comprising: loading data from an HDF file created from data in the array-based computing language into a workspace of a computing environment.
 44. The medium of claim 43, wherein a portion of the data locked for reading or writing, wherein the locked portion is specific to the data that are stored or loaded.
 45. The medium of claim 43, wherein more than one read process read different portions of the data and builds an in-memory representation of the data.
 46. The medium of claim 43, wherein a state of the loaded data is substantially the same as a state of the data in the workspace before the data is stored in the scientific data format file.
 47. The medium of claim 34, wherein a dimension of the data is reversed to store the data in the scientific data format file.
 48. The medium of claim 34, wherein it is determined whether the data is to be stored in a little or big endian representation.
 49. The medium of claim 34, wherein a sparse array is mapped as a group in the scientific data format.
 50. The medium of claim 49, wherein the sparse array is mapped as three datasets, each dataset having one dimensional dataspace.
 51. The medium of claim 49, wherein the sparse array is represented in canonical form and stored as one dataset having one dimensional dataspace.
 52. The medium of claim 34, wherein an empty array is mapped as a dataset with an attribute indicating that the array is empty.
 53. The medium of claim 34, wherein a data type is mapped as a dataset of references that point to child nodes of a top level group.
 54. The medium of claim 53, wherein the data type is a cell array and wherein each element of the cell array is stored in one of the child nodes.
 55. The medium of claim 34, wherein at least one user defined data type is mapped to the scientific data format.
 56. The medium of claim 34, wherein one or more data types are mapped to the scientific data format.
 57. The medium of claim 56, where at least one data type is at least one of: a struct, a function handle, an opaque type, and an object oriented data type.
 58. A system for storing or loading data, the system comprising: an array-based computing environment for providing data, a schema for mapping types of the data generated in the array-based computing environment to corresponding data types of the scientific data format file; and a storage unit for storing the data in a scientific data format file, wherein values of fields shared in the scientific data format.
 59. The system of claim 58, wherein the storage unit is internal or external to the array-based computing environment.